Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[api] delete file #1122

Open
wants to merge 20 commits into
base: main
Choose a base branch
from
Open

[api] delete file #1122

wants to merge 20 commits into from

Conversation

MalinAhlberg
Copy link
Contributor

@MalinAhlberg MalinAhlberg commented Nov 7, 2024

Related issue(s) and PR(s)
This PR closes #1134 .

Description
This PR adds the delete file functionality to the api component. Specifically, it deletes the file from the inbox and it adds a new file log event, setting the file status to disabled.

Also, it adds the fileID to the list functionality of the api, since that field is needed in order to delete a file:

curl -H "Authorization: Bearer $token" "http://localhost:8090/users/[email protected]/files" | jq .
[
  {
    "fileID": "d32f7117-bb85-40e2-9c52-101bf9c1ca5a",
    "inboxPath": "test_dummy.org/race_file.c4gh",
    "fileStatus": "ready",
    "createAt": "2024-11-19T10:50:07.836513Z"
  },
  {...

How to test
make build-all then PR_NUMBER=$(date +%F) docker compose -f .github/integration/sda-s3-integration.yml run integration_test.
List the files (eg with http://localhost:8090/users/[email protected]/files) and make sure files in the inbox can be deleted, and that archived files can not be deleted.

@MalinAhlberg MalinAhlberg force-pushed the feature/api-delete-file branch 8 times, most recently from b31bc38 to 446a1ff Compare November 21, 2024 10:22
@MalinAhlberg MalinAhlberg marked this pull request as ready for review November 21, 2024 10:22
@MalinAhlberg MalinAhlberg requested a review from a team November 21, 2024 10:25
Copy link
Contributor

@kostas-kou kostas-kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job. I have only some minor comments

sda/cmd/api/api.go Outdated Show resolved Hide resolved
sda/internal/database/database.go Show resolved Hide resolved
.github/integration/tests/sda/60_api_admin_test.sh Outdated Show resolved Hide resolved
sda/cmd/api/api.md Show resolved Hide resolved
@MalinAhlberg
Copy link
Contributor Author

MalinAhlberg commented Nov 27, 2024

Great comments @kostas-kou ! Fixed most of them in 92903a6, but left this one for others to see <- solved during stand-up.

kostas-kou
kostas-kou previously approved these changes Nov 28, 2024
pahatz
pahatz previously approved these changes Nov 28, 2024
Copy link
Contributor

@pahatz pahatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Especially on the tests, very extensive.
I don't have any significant remark on the PR.

sda/cmd/api/api.go Outdated Show resolved Hide resolved
sda/internal/database/db_functions.go Show resolved Hide resolved
@MalinAhlberg
Copy link
Contributor Author

@kostas-kou and @pahatz, thanks for your reviews! I have fixed the weird comment, rebased on main and also rebased to get rid of the fixup-commits. Only 3ac7e5e is new, the rest is the same as when you reviewed.

@MalinAhlberg
Copy link
Contributor Author

...and added 8a745c4 for the rbac

pahatz
pahatz previously approved these changes Nov 29, 2024
aaperis
aaperis previously approved these changes Dec 2, 2024
Copy link
Contributor

@aaperis aaperis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't tested, but it looks good. Tiny minor comments :-)

Comment on lines +44 to +80
last_event=$(psql -U postgres -h postgres -d sda -At -c "SELECT event FROM sda.file_event_log WHERE file_id='$fileid' order by started_at desc limit 1;")

if [ "$last_event" != "disabled" ]; then
echo "The file $fileid does not have the expected las event 'disabled', but $last_event."
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it is too much, but shouldn't the test check at this point that the file was deleted from the s3 bucket as well?

Copy link
Contributor Author

@MalinAhlberg MalinAhlberg Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes... it shoud. Fixed in d690e54

- `/file/:username/*fileid`
- accepts `DELETE` requests
- marks the file as `disabled` in the database, and deletes it from the inbox.
- The file is identfied by its id, returned by `users/:username/:files`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The file is identfied by its id, returned by `users/:username/:files`
- The file is identified by its id, returned by `users/:username/:files`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 109e7e2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In d690e54

}

submissionUser := c.Param("username")
log.Warn("submission user:", submissionUser)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why log.Warn and not log.Debug or log.Info?

Copy link
Contributor Author

@MalinAhlberg MalinAhlberg Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason 😬 ... Changed in d690e54


fileID := c.Param("file")
fileID = strings.TrimPrefix(fileID, "/")
log.Warn("submission file:", fileID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why log.Warn?

Copy link
Contributor Author

@MalinAhlberg MalinAhlberg Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason. Changed in d690e54

Copy link
Collaborator

@jbygdell jbygdell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More specific comments are coming

@MalinAhlberg MalinAhlberg force-pushed the feature/api-delete-file branch 2 times, most recently from 109e7e2 to d690e54 Compare December 3, 2024 07:43
Copy link
Collaborator

@jbygdell jbygdell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a few things here that we need to think more about.

  • In FEGA the submitter can trigger ingestion of an uploaded file. That means we also needs to be able to remove a file both from the inbox and the archive during a cleaning operation. As well as the backup site if that is configured.
  • The API can also be used by the submitter and that means we can't use the same functions or structs sometimes as that might leak things the submitter should not be able to see.

.github/integration/sda/rbac.json Outdated Show resolved Hide resolved
sda/cmd/api/api.go Show resolved Hide resolved
.github/integration/sda/rbac.json Outdated Show resolved Hide resolved
@@ -100,6 +106,7 @@ func setup(config *config.Config) *http.Server {
r.POST("/c4gh-keys/add", rbac(e), addC4ghHash) // Adds a key hash to the database
r.GET("/c4gh-keys/list", rbac(e), listC4ghHashes) // Lists key hashes in the database
r.POST("/c4gh-keys/deprecate/*keyHash", rbac(e), deprecateC4ghHash) // Deprecate a given key hash
r.DELETE("/file/:username/*file", rbac(e), deleteFile) // Delete a file from inbox
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The *file here will catch a full path i.e. /path/to/file.c4gh. If we want' that semantics we should probably call it filePath.

If we not intend to use the file path but instead use the file id (to cope with BPs expected upload structure) the *file should be :file or :fileID as we only expect a single value (no slashes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It's the fileID that should be there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0fa47f5

sda/cmd/api/api.go Outdated Show resolved Hide resolved
sda/cmd/api/api.go Outdated Show resolved Hide resolved
sda/cmd/api/api.go Outdated Show resolved Hide resolved
.github/integration/tests/sda/60_api_admin_test.sh Outdated Show resolved Hide resolved
sda/internal/database/database.go Show resolved Hide resolved
Comment on lines 88 to 90
"AND EXISTS (SELECT 1 FROM " +
"(SELECT event from sda.file_event_log where file_id = $2 order by started_at desc limit 1) " +
"as subquery WHERE event in ('registered', 'uploaded', 'submitted', 'ingested', 'error'))"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a bit iffy since in FEGA if the submission user adds a file to a run by mistake that file will be ingest, archived, backed up and given a stable ID (a case we actually will ned to be able to handle).
If we want to limit this to files that only have been uploaded to the inbox ingested can not be a valid event since then the file will also be in the archive.

Instead we probably need to check if the file has had a state of ingested, verified, archived or ready and then also remove it from the archive. If the file has had the state backed up we also need to remove the file from the backup site.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was decided that this issue should not handle deleting files from the archive.
So we should remove ingested from the event list here..?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we focus on files that are strictly uploaded only 'uploaded' would be the available option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the latest commit.

@MalinAhlberg
Copy link
Contributor Author

Thanks @jbygdell for the comments! I will fix the trivial ones, and have answered to some. The more complex ones (these) should be dicussed in the team/with the PO.

sda/cmd/api/api.md Outdated Show resolved Hide resolved
sda/cmd/api/api.go Outdated Show resolved Hide resolved
@MalinAhlberg
Copy link
Contributor Author

Updated, fixed, rebased etc. Please re-review.

fi

# Try to delete an unknown file
resp="$(curl -s -k -L -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $token" -H "Content-Type: application/json" -X DELETE "http://api:8080/file/[email protected]/badfileid")"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
resp="$(curl -s -k -L -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $token" -H "Content-Type: application/json" -X DELETE "http://api:8080/file/[email protected]/badfileid")"
resp="$(curl -s -k -L -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $token" -X DELETE "http://api:8080/file/[email protected]/badfileid")"


# Try to delete file of other user
fileid="$(curl -k -L -H "Authorization: Bearer $token" -H "Content-Type: application/json" "http://api:8080/users/[email protected]/files" | jq -r '.[0]| .fileID')"
resp="$(curl -s -k -L -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $token" -H "Content-Type: application/json" -X DELETE "http://api:8080/file/[email protected]/$fileid")"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
resp="$(curl -s -k -L -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $token" -H "Content-Type: application/json" -X DELETE "http://api:8080/file/[email protected]/$fileid")"
resp="$(curl -s -k -L -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $token" -X DELETE "http://api:8080/file/[email protected]/$fileid")"

exit 1
fi

fileid="$(curl -k -L -H "Authorization: Bearer $token" -H "Content-Type: application/json" "http://api:8080/users/[email protected]/files" | jq -r '.[] | select(.inboxPath == "test_dummy.org/NE12878.bam.c4gh") | .fileID')"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fileid="$(curl -k -L -H "Authorization: Bearer $token" -H "Content-Type: application/json" "http://api:8080/users/[email protected]/files" | jq -r '.[] | select(.inboxPath == "test_dummy.org/NE12878.bam.c4gh") | .fileID')"
fileid="$(curl -k -L -H "Authorization: Bearer $token" "http://api:8080/users/[email protected]/files" | jq -r '.[] | select(.inboxPath == "test_dummy.org/NE12878.bam.c4gh") | .fileID')"

Comment on lines +142 to +150
until [ "$(psql -U postgres -h postgres -d sda -At -c "select id from sda.file_events e where e.title in (select event from sda.file_event_log where file_id = '$fileid' order by started_at desc limit 1);")" -gt 30 ]; do
echo "waiting for ingest to complete"
RETRY_TIMES=$((RETRY_TIMES + 1))
if [ "$RETRY_TIMES" -eq 30 ]; then
echo "::error::Time out while waiting for upload to complete"
exit 1
fi
sleep 2
done
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ingest automatically triggers a verification event why not use that as a watcher, like we do in 20_ingest-verify_test

echo "waiting for verify to complete"
RETRY_TIMES=0
until [ "$(curl -su guest:guest http://rabbitmq:15672/api/queues/sda/verified/ | jq -r '.messages_ready')" -eq 4 ]; do
    echo "waiting for verify to complete"
    RETRY_TIMES=$((RETRY_TIMES + 1))
    if [ "$RETRY_TIMES" -eq 30 ]; then
        echo "::error::Time out while waiting for verify to complete"
        exit 1
    fi
    sleep 2
done


# Try to delete file not in inbox
fileid="$(curl -k -L -H "Authorization: Bearer $token" -H "Content-Type: application/json" "http://api:8080/users/[email protected]/files" | jq -r '.[] | select(.inboxPath == "test_dummy.org/NE12878.bam.c4gh") | .fileID')"
resp="$(curl -s -k -L -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $token" -H "Content-Type: application/json" -X DELETE "http://api:8080/file/[email protected]/$fileid")"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
resp="$(curl -s -k -L -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $token" -H "Content-Type: application/json" -X DELETE "http://api:8080/file/[email protected]/$fileid")"
resp="$(curl -s -k -L -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $token" -X DELETE "http://api:8080/file/[email protected]/$fileid")"

fileID = strings.TrimPrefix(fileID, "/")
log.Debug("submission file:", fileID)
if fileID == "" {
c.AbortWithStatusJSON(http.StatusBadRequest, "file ID is required")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
c.AbortWithStatusJSON(http.StatusBadRequest, "file ID is required")
c.AbortWithStatusJSON(http.StatusBadRequest, "file ID is required")
return

submissionUser := c.Param("username")
log.Debug("submission user:", submissionUser)

fileID := c.Param("file")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fileID := c.Param("file")
fileID := c.Param("fileid")


fileID := c.Param("file")
fileID = strings.TrimPrefix(fileID, "/")
log.Debug("submission file:", fileID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.Debug("submission file:", fileID)
log.Debug("submission fileID:", fileID)

Comment on lines +323 to +338
var RetryTimes = 5
for count := 1; count <= RetryTimes; count++ {
log.Warn("trying to remove file from inbox, try", count)
err = inbox.RemoveFile(filePath)
if err != nil {
log.Errorf("Remove file from inbox failed, reason: %v", err)
}

// The GetFileSize returns zero in case of error after retrying a number of times
fileSize, _ := inbox.GetFileSize(filePath)
if fileSize == 0 {
break
}

time.Sleep(time.Duration(math.Pow(2, float64(count))) * time.Second)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var RetryTimes = 5
for count := 1; count <= RetryTimes; count++ {
log.Warn("trying to remove file from inbox, try", count)
err = inbox.RemoveFile(filePath)
if err != nil {
log.Errorf("Remove file from inbox failed, reason: %v", err)
}
// The GetFileSize returns zero in case of error after retrying a number of times
fileSize, _ := inbox.GetFileSize(filePath)
if fileSize == 0 {
break
}
time.Sleep(time.Duration(math.Pow(2, float64(count))) * time.Second)
}
for count := 1; count <= 5; count++ {
err := inbox.RemoveFile(filePath)
if err == nil {
break
}
if count == 5 {
log.Errorf("remove file from inbox failed, reason: %v", err)
c.AbortWithStatusJSON(http.StatusInternalServerError, ("remove file from inbox failed")
return
}
time.Sleep(time.Duration(math.Pow(2, float64(count))) * time.Second)
}

sda/cmd/api/api.go Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[api] delete files from inbox
7 participants